Normalising orthographic and dialectal variants for the automatic processing of Swiss German

Abstract

Unlike most dialects of well-standardised languages, Swiss German dialects are widely used in everyday communication. Even so, they lack tools and resources for natural language processing, mainly because the dialects are mostly spoken and the written resources that exist are small and highly inconsistent. This paper addresses the great variability in writing, which poses a problem for automatic processing. We propose an automatic approach to normalising the variants to a single representation intended for the internal use of processing tools (not shown to human users). We manually create a sample of transcribed and normalised texts, which we use to train and test three methods based on machine translation: word-by-word mappings, character-based machine translation, and language modelling. We show that an optimal combination of the three approaches gives better results than any of them separately.
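
As a rough illustration of the first method named in the abstract, word-by-word mapping, the sketch below learns the most frequent normalised form for each written variant and passes unseen words to a fallback. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `train_word_map` and `normalise`, the `fallback` parameter (standing in for a character-based model), and the toy word pairs are all hypothetical and do not come from the paper's data.

```python
from collections import Counter, defaultdict


def train_word_map(pairs):
    """Learn, for each written variant, its most frequent normalised form.

    `pairs` is an iterable of (variant, normalised) tuples, as one might
    extract from a manually normalised sample (assumed input format).
    """
    counts = defaultdict(Counter)
    for variant, normalised in pairs:
        counts[variant.lower()][normalised] += 1
    # Keep only the single most frequent normalisation per variant.
    return {v: c.most_common(1)[0][0] for v, c in counts.items()}


def normalise(tokens, word_map, fallback=lambda w: w):
    """Normalise known words via the table; send unseen words to `fallback`.

    `fallback` is a hypothetical hook where a character-based model could
    be plugged in; by default unseen words are returned unchanged.
    """
    result = []
    for token in tokens:
        key = token.lower()
        if key in word_map:
            result.append(word_map[key])
        else:
            result.append(fallback(token))
    return result


# Toy training pairs, invented for illustration only.
toy_pairs = [
    ("gsi", "gewesen"),
    ("gsy", "gewesen"),
    ("nöd", "nicht"),
    ("nid", "nicht"),
    ("nöd", "nicht"),
]
word_map = train_word_map(toy_pairs)
normalised = normalise(["gsi", "nid", "huus"], word_map)
```

Here the two spellings `gsi` and `gsy` collapse to one normalised form, while the unseen word `huus` is left for the fallback to handle, which is where a character-level method would be needed.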
